Universality, limits and predictability of gold-medal performances at the Olympic Games
Inspired by the Games held in ancient Greece, modern Olympics represent the world's largest 
pageant of athletic skill and competitive spirit. Performances of athletes at the Olympic Games 
mirror, since 1896, human potentialities in sports, and thus provide an optimal source of informa- 
tion for studying the evolution of sport achievements and predicting the limits that athletes can 
reach. Unfortunately, the models introduced so far for the description of athlete performances at 
the Olympics are either sophisticated or unrealistic, and more importantly, do not provide a unified 
theory for sport performances. Here, we address this issue by showing that relative performance 
improvements of medal winners at the Olympics are normally distributed, implying that the evo- 
lution of performance values can be described in good approximation as an exponential approach 
to an a priori unknown limiting performance value. This law holds for all specialties in athletics 
- including running, jumping and throwing - and swimming. We present a self-consistent method, 
based on normality hypothesis testing, able to predict limiting performance values in all specialties. 
We further quantify the most likely years in which athletes will breach challenging performance 
walls in running, jumping, throwing and swimming events, as well as the probability that new world 
records will be established at the next edition of the Olympic Games. 



Introduction 

Modern Olympics are inspired by the ancient version of
the Games, but based on a wider idea of globality. While
the ancient Games were open only to Greek-speaking
athletes [1], the modern Olympics have been, since their
beginning, a world event involving people from every part
of the globe [2]. The very symbol of the Olympics,
composed of five interlocking rings standing for the five
continents, was designed by Baron Pierre de Coubertin,
the founder of the modern Olympic Games, with the aim
of reinforcing the idea that the Games are an
international event and welcome all countries of the world [3].
Since Athens 1896, 26 editions of the event have been
organized in different locations around the world, and, from
the 241 participants representing 14 nations of the first
edition, the Games have grown to about 10,500
competitors from 204 countries at the latest edition of the
summer Games, Beijing 2008. The Olympics are one
of the most important events worldwide not only for sports,
but also for politics and society. Many important events
of the last century, such as Nazism [4], the
Israeli-Palestinian conflict [5], and the Cold War [6], have
influenced the regular organization of the Games. Also,
the Olympics generally play a fundamental and positive
role in the economic and urban development of the city
that hosts the event [7, 8].

Performance data of athletes at the Olympics are avail- 
able for each modern edition of the Games organized 
so far, and represent an optimal proxy for the study of 
human limits in sport performances for three main rea- 
sons: (i) Data cover more than a century of sport perfor- 
mances since the first edition of the Olympics dates back 
to 1896; (ii) Olympic data provide a detailed record of 
sports performances at regular 4-year intervals; (iii) The 
performances of Olympic medalists truly reflect the best
achievements that could be obtained in a given historic
moment because, in the vast majority of sport disciplines, 
the Games have always represented the most important 
event during the career of an athlete, and consequently 
all the greatest athletes have always taken part in the 
Olympics. 

Recent years have witnessed the appearance of a large
number of statistical studies of data coming from
professional sports. Examples include basketball [9, 10],
baseball [11-15], soccer [16], tennis [17], etc. Olympic
performance data have also been the subject of many
analyses [18-28]. Some of them focused on models aimed at
describing the progression of performances over time,
including linear models [24], which can even lead to
unrealistic results [29, 30], S-shaped curves [5S] and logistic
functions TT. Others studied statistical properties of
performance patterns, such as the power-law relation between
time (or speed) and length of running events [IHl HH [H].
In addition, performance data of athletes at the Olympics
have been used to tune the parameters of complicated
models aimed at the determination of physiological
limits in sport performances [31-33]. For example, according
to a mathematical model for human running performance
that accounts for various energetic factors, such as the
capacity of anaerobic metabolism, maximal aerobic power and
reduction in peak aerobic power, Perronet and Thibault
predicted the limiting times that athletes can reach in
various running events in athletics [32].
Despite these numerous efforts, however, we still lack a
general description of the performances of athletes, as
well as a universal way to predict limiting performance
values and to calculate the probability of future
achievements in sport. In this paper, we address these issues
by generating a simple and coherent picture for the
description of the performances obtained by Olympic medal
winners in all specialties of athletics and swimming. We
analyze historic performance data and provide empirical
evidence for a novel statistical law
governing performances of medal winners at the Olympic
Games. With a self-consistent approach we simultaneously
(i) show that performance improvements obey a
universal law, (ii) estimate limiting performance values,
and (iii) predict future achievements at the Olympics.



Results 

While former statistical studies have mainly analyzed
the progression of absolute performance values along the
various editions of the Games, here we change point of
view and focus on relative improvements
in performance between two consecutive editions of the
Olympics. Let us indicate with p_y the value of the
performance obtained by the gold medalist in a specific
specialty at the edition of year y of the Olympic Games.
Depending on the specialty, p_y may indicate time (running
and swimming), length (long and triple jumps), height
(high jump and pole vault), or distance (discus and
hammer throws, shot put). We define the relative
improvement of the gold-medal performance in the Games of year
y with respect to the gold-medal performance in the
previous edition of the Olympics as

\xi_y := \frac{\Delta p_{y-4} - \Delta p_y}{\Delta p_{y-4}} ,    (1)

where Δp_y = p_y - p_∞ represents the gap between the
performance value of the gold medalist in year y and the
asymptotic performance value p_∞. The asymptotic or
limiting performance value p_∞ is an unknown parameter
representing the physiological limit that can be achieved
in the specialty by an athlete. Eq. (1) defines the relative
improvement towards the asymptotic performance value
of the gold medalist in year y with respect to the
performance of the gold medalist in year y - 4. Note that the
same definition can be used for the measurement of the
relative improvements of silver and bronze medalists,
and in principle for athletes who have reached any
arbitrary rank position.
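
As a minimal sketch of Eq. (1), the following Python function computes the relative improvements for a series of gold-medal performances, given a trial value of the asymptotic performance. The function name, the data layout and the sample times are illustrative assumptions and are not taken from the data set of the paper. Only editions exactly four years apart are compared, as specified in Materials and Methods.

```python
def relative_improvements(performances, p_inf):
    """Relative improvements xi_y of Eq. (1).

    performances: dict mapping edition year -> gold-medal performance.
    p_inf: trial asymptotic performance value (assumed parameter).
    Only editions exactly four years apart are compared.
    """
    xi = {}
    for year in sorted(performances):
        prev = year - 4
        if prev not in performances:
            continue  # no edition four years earlier (e.g. war years)
        gap_prev = performances[prev] - p_inf   # Delta p_{y-4}
        gap_now = performances[year] - p_inf    # Delta p_y
        xi[year] = (gap_prev - gap_now) / gap_prev
    return xi


# Illustrative winning times in seconds (hypothetical sample).
times = {2000: 43.84, 2004: 44.00, 2008: 43.75}
xi = relative_improvements(times, p_inf=41.62)
for year, value in xi.items():
    print(year, round(value, 3))
```

A negative value means that the winning performance moved away from the limit with respect to the previous edition. The same expression also works for jumps and throws, where both gaps are negative.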

For reasonable values of p_∞, we find that the distribution
of the relative performance improvements is statistically
consistent with a normal distribution. We determine the
best estimate p̂_∞ of the asymptotic performance value
as the value of p_∞ for which the statistical significance
(p-value) of the normal fit is maximized (see Materials
and Methods section). The procedure is generally
accurate and allows us to identify reasonable values
of p_∞ in all specialties considered in this study. In
Fig. 1, for example, we report the results obtained by
analyzing performance data of male athletes in 400
meters sprint. The best estimate of the asymptotic time
is p̂_∞ = 41.62 seconds. For this value of p_∞, we find
that relative improvements obey a normal distribution
with average value μ = 0.06 and standard deviation
σ = 0.19. Statistical significance, however, can be used
not only for the determination of the best estimate
of the asymptotic performance value, but also, in a
broader sense, to define confidence intervals for p_∞.
In the case of 400 meters sprint of male athletes, for
example, we find that, at 5% significance level, p_∞ is in
the range 31.03 to 43.09 seconds. At 50% significance
level, the interval is narrower and p_∞ is in the range
38.91 to 42.74 seconds, while, at 95% significance level,
p_∞ is expected to be between 41.04 and 42.13 seconds.
The results shown in Fig. 1 are obtained by analyzing
the relative performance improvements of gold-medal
winners. Similar results are, however, obtained when
considering the performances of silver and bronze
medalists (Fig. S1). Interestingly, the finiteness of the
data does not affect the reliability of the best estimate of
the limiting performance value, since compatible values
of p_∞ can be detected by removing results of the latest
editions of the Games from the analysis (Fig. S2).
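
The self-consistent search for the best estimate of p_∞ can be sketched in Python as follows. This is only an illustration under stated assumptions: the series is synthetic (generated from the model itself with hypothetical parameters), the grid of candidate values is arbitrary, and the code minimizes the modified Anderson-Darling statistic instead of maximizing the p-value, relying on the monotonic relation between the two noted in Materials and Methods. The function names are not from the paper.

```python
import math
import random
from statistics import NormalDist, mean, stdev

_PHI = NormalDist().cdf


def ad_statistic(sample):
    """Modified Anderson-Darling statistic A*^2 for normality (mean and
    standard deviation estimated from the sample)."""
    r = len(sample)
    m, s = mean(sample), stdev(sample)
    z = sorted((x - m) / s for x in sample)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        total += (2 * i - 1) * math.log(_PHI(zi))
        total += (2 * (r - i) + 1) * math.log(_PHI(-zi))  # log(1 - Phi(z_i))
    a2 = -r - total / r
    return a2 * (1 + 4 / r - 25 / r**2)


def best_p_inf(perf, candidates):
    """Return the candidate p_inf whose relative improvements look most normal."""
    def score(p_inf):
        gaps = [p - p_inf for p in perf]
        xi = [(a - b) / a for a, b in zip(gaps, gaps[1:])]
        return ad_statistic(xi)
    return min(candidates, key=score)


# Synthetic series generated from the model itself (hypothetical parameters).
rng = random.Random(1)
true_p_inf, gap = 42.0, 12.0
perf = []
for _ in range(26):
    perf.append(true_p_inf + gap)
    gap *= 1 - rng.gauss(0.06, 0.19)

fastest = min(perf)
candidates = [fastest - 0.05 * k for k in range(1, 400)]
estimate = best_p_inf(perf, candidates)
print(round(estimate, 2))
```

With only 25 improvements the minimum can be broad, so the estimate from a single synthetic series need not be close to the value used to generate it.
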

Figure 1: Performances of male gold medalists in 400 meters
sprint. a. Best estimate of the asymptotic performance
value. For each value of p_∞ lower than the current Olympic
record, we evaluate the goodness of the fit of performance
improvements with a normal distribution. p̂_∞ is determined
as the value of the asymptotic time p_∞ that maximizes the
statistical significance (p-value). For men 400 meters sprint,
our best estimate is p̂_∞ = 41.62 seconds, where we find that
relative performance improvements are normally distributed
with a confidence of 98%. For this value of p_∞, the best
empirical estimates of the average value and standard deviation
are respectively μ = 0.06 and σ = 0.19. b. The cumulative
distribution function of the z-scores obtained for p_∞ = p̂_∞
(red curve) is compared with the standard normal cumulative
distribution (black curve). c. Normal sample quantiles are
plotted against normal theoretical quantiles [5T]. The dashed
line corresponds to the theoretically expected behavior in case
of a perfect agreement between sample and theoretical
distributions. d. z-scores of relative performance improvements
between consecutive editions of the Games.



Figure 2: Statistical properties of performance improvements
in athletics. In the main panels we show the determination
of the best estimate p̂_∞ of the asymptotic performance value,
while in the insets we provide a graphical comparison between
the sample cumulative distributions (red line) and the
standard normal cumulative distribution (black line). a and b.
We report the results obtained by the analysis of the
performances of male athletes in marathon (p̂_∞ = 5,771 seconds,
p-value = 0.58) and female athletes in long jump (p̂_∞ = 8.12
meters, p-value = 0.34). c and d. We show the outcome
of our method for performances of men and women in 100
meters sprint (respectively, p̂_∞ = 8.28 seconds and p-value
= 0.64, p̂_∞ = 9.72 seconds and p-value = 0.97).

The normality of the relative improvements towards the
asymptotic performance value is a simple and strong
result. At each new edition of the Games, gold-medal
performances get, on average, closer to the limiting
performance value. The average positive improvement
observed in historic performance data can be explained
by several factors: as time goes on, athletes become
more professional and better trained, and have more
events to participate in during the season; the pool for the
selection of athletes grows with time, and, consequently,
there is a higher level of competition; the evolution of
technical materials favors better performances. On the
other hand, there is also a non-null probability that
winning performances become worse than those
obtained in the previous edition of the Games (i.e., relative
improvement values are negative). All these possibilities
are described by a Gaussian distribution that accounts
for various, in principle hardly quantifiable, factors that
may influence athlete performances: meteorological
and geographical conditions, athletic skills and physical
condition of the participants, etc. The accuracy of the
normal fit is attested not only by its high statistical
significance, but also by graphical comparisons between
the sample distribution and the theoretical normal
distribution (see Figs. 1b and 1c). It is also important to
note that the values of the relative improvements do not
depend on the particular edition of the Games, and thus
their distribution is stationary (Fig. 1d). The strength
of our results, however, lies not only in the significance
of the fits, but especially in their generality. We repeated
the same type of analysis for a total of 55 different
specialties, and found that performance improvements
are governed by a universal law. First of all, the law
holds for all running events in athletics. This is valid for
a heterogeneous set of running distances ranging from
100 to 42,195 meters (marathon, Fig. 2 and Supporting
Information S1). Second, our analysis suggests that
relative improvements are normally distributed not only
when considering time performances, but also
performances regarding length or height (jumps) and distance
(throws). In Fig. 2b, for example, we report the outcome
of our method when applied to performance data of
female gold medalists in long jump. Other examples can
be found in Supporting Information S2. Finally, the
law is valid for performance improvements of athletes in
swimming specialties (Supporting Information S3).
Given the attention received in the recent
past [231 [211 [3D], we give special consideration
to the comparison of performances between female and
male athletes in 100 meters sprint. In Figs. 2c and 2d,
we report the results obtained through the analysis of
Olympic performances in this specialty. According to
our analysis, the best estimate of the limiting time for
males is p̂_∞ = 8.28 seconds, while for females we identify
the best estimate for the asymptotic time at p̂_∞ = 9.72
seconds. Our statistical analysis predicts that women
will always be slower than men and that the gap will
saturate at about 14%, consistent with the estimation
by Sparling et al [2U] but in disagreement with what is
predicted by the unrealistic model of Atkinson et al [24].
It should be noted that for women the statistical
significance is less predictive than the one measured for men.
While for men we observe that statistical significance is
clearly peaked around p̂_∞ and goes rapidly to zero as
p_∞ decreases, the same does not happen in the case of
women. We believe that the statistics are less accurate
because the analysis is based on 19 editions instead of 26,
since women started to run the 100 meters sprint only
in Amsterdam 1928, while men already did so in Athens 1896.
In particular, the lack of sufficient data provides high
statistical significance also for the unrealistic p_∞ =
seconds. We expect, however, that the future addition
of more data points will suppress this effect. Despite
these problems, our analysis still produces meaningful
estimates of the upper bound of the asymptotic time: at
5% significance level, the asymptotic value is expected
to be lower than 10.31 seconds, while at 50% significance
level, p_∞ should be lower than 10.17 seconds. Also,
our best estimates of the limiting performance values
are probably not as accurate for this specialty (or other
short distances) because there is not enough reliable
performance data regarding the first editions of the
Games (automatic timing was introduced in Mexico City
1968). The removal of data points for male 100 meters
sprint before Amsterdam 1928 (and in general of a few
data points from the entire time series) also makes it
impossible to determine the best estimate of the
asymptotic time as a global maximum of statistical
significance (see Fig. S3). For 100 meters sprint, we have
therefore performed an additional analysis in which we
aggregated the results of gold, silver and bronze
medalists and obtained slightly different estimates for the
limiting performance values [p̂_∞ = 8.80 seconds for men
(Fig. S4) and p̂_∞ = 9.64 seconds for women (Fig. S5-S6)].

Figure 3: Scaling law between asymptotic time and running
length, and prediction of performances at future editions of the
Olympic Games. a. Relation between the best estimates of
the limiting performance value p̂_∞ and the length ℓ of the race
for men running events in athletics (red circles). We excluded
from the analysis relay and hurdles events. We find that p̂_∞ ~
ℓ^{α_∞}, and the best estimate of the power-law exponent is α_∞ =
1.10 ± 0.02 (black line). b. Probability density functions of the
winning time for the men 400 meters sprint in future editions
of the Games. The dashed line represents the winning time
in the latest edition of the Olympics in Beijing 2008. This
value is used as initial condition for the prediction of future
performances. c. The probability density of the winning time
in men 400 meters predicted by our model is compared to past
performance data (black circles). The density plot is obtained
by convolving the various prediction curves derived from real
data. d. Probability that athletes will breach challenging
walls in various specialties of athletics as a function of time.



In general, our approach produces good results for
specialties with a sufficiently long tradition in the Games.
This is basically the case of all male specialties in
athletics. Data about female performances typically
provide less accurate results, but still, in the majority of
cases, the predictions of the asymptotic performance
values are reasonable. We summarize in Table 1 the results
obtained for some specialties, while we refer to the
Supplementary Information for a systematic analysis of all of
them. It should be noted that there are also a few cases
in which things do not work perfectly. In women 800
meters, for example, statistical significance does not exhibit
any peak value (Supporting Information S1). There are
also a few specialties in which the best estimate of the
limiting performance value does not correspond to the
global maximum of statistical significance (Supporting
Information S1). In these cases, statistical significance is
a non-monotonic function of p_∞ and multiple maxima
are present. Still, the peak value that appears most
plausible can be used as an estimate of p_∞. Finally, there
are three specialties in athletics in which a clear peak in
statistical significance is visible only by excluding
performance data of Sydney 2000, but this exclusion is fully
justified by the fact that the top athletes of the moment did
not take part in the competition (Supporting
Information S1). For example, about the men 200 meters sprint
of Sydney 2000, the web site sports-reference.com
reports: "This race was expected to be between the
Americans Maurice Greene and Michael Johnson. Greene was
the best in the world at 100 meters and Johnson at 400
meters, and their race in the middle distance was highly
anticipated. But neither qualified for the team at the
Olympic Trials, succumbing to minor injuries, although
they both made the team in their better events."
The good accuracy of our best estimates of the
limiting performance values is supported also by the
power-law relation between these quantities and the length of
the running events in athletics (see Fig. 3a). As already
observed by Katz and Katz, world record times (p_wr)
and running distances (ℓ) are related by the power-law
relation p_wr ~ ℓ^α [21]. Katz and Katz studied the
relation between world record performances and running
distances in various epochs, and found that the
power-law exponent value α is always slightly larger than 1.1
but decreases for more recent epochs. For example, they
measured α ≈ 1.14 in 1925, and α ≈ 1.12 in 1995. On the
basis of our measurements, we claim that the asymptotic
value of the exponent will be exactly α_∞ = 1.1, when
limiting performance values, and thus definitive world
records, will be reached in all specialties of athletics.
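
As a rough illustration of the power-law relation above, the following Python sketch fits the exponent by ordinary least squares in log-log space. It uses only the four men's running events whose best estimates of p_∞ are listed in Table 1, whereas the analysis of Fig. 3a includes all men's running events except relays and hurdles, so the fitted value need not coincide with 1.10 ± 0.02. The choice of an unweighted least-squares fit is an assumption of this sketch.

```python
import math

# Best estimates of p_inf (seconds) for men's running events listed in Table 1.
distance_m = [100, 400, 10_000, 42_195]
p_inf_s = [8.28, 41.62, 1539, 5771]

x = [math.log(d) for d in distance_m]
y = [math.log(p) for p in p_inf_s]
n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n

# Ordinary least squares in log-log space: log p = alpha * log l + c
alpha = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
         / sum((a - x_mean) ** 2 for a in x))
c = y_mean - alpha * x_mean
print(f"alpha = {alpha:.2f}")
```
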
A final application of our findings is the prediction of
future performances at the Olympics. The performance
value of the gold medalist in London 2012, for example,
can be estimated as p_2012 = (p_2008 - p_∞)(1 - ξ) + p_∞,
where ξ is a random variate extracted from the
normal distribution N(ξ; μ, σ) with mean value μ and
standard deviation σ. Similar equations can be written also
to predict performance values of the other editions
after London 2012. For each future edition of the Games,
we can draw a distribution of performance values (see
Fig. 3b). The distribution is normal for the edition of
2012, but deviates from normality as time grows. In
particular, while the expected performance value decreases
exponentially towards the asymptotic performance value
as time increases, the standard deviation initially grows
as we move further into the future until predictions become
again more accurate because of the boundary effect of
p_∞ (see Fig. 3c).
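
The one-step prediction rule can be sketched in Python by direct sampling. The values of p_∞, μ and σ are those reported for the men's 400 meters in Table 1. The Beijing 2008 winning time of 43.75 seconds is an input assumed here, since it is not stated in the text, and the function name, sample size and seed are arbitrary choices.

```python
import random
from statistics import mean, stdev


def sample_next_edition(p_prev, p_inf, mu, sigma, n_samples=100_000, seed=0):
    """Draw winning performances for the next edition:
    p_next = (p_prev - p_inf) * (1 - xi) + p_inf, with xi ~ N(mu, sigma)."""
    rng = random.Random(seed)
    gap = p_prev - p_inf
    return [gap * (1 - rng.gauss(mu, sigma)) + p_inf for _ in range(n_samples)]


# Men's 400 m: p_inf, mu, sigma from Table 1; 43.75 s is the assumed 2008 time.
samples = sample_next_edition(p_prev=43.75, p_inf=41.62, mu=0.06, sigma=0.19)
print(f"{mean(samples):.2f} +/- {stdev(samples):.2f}")
```

Under these assumptions the sample mean and standard deviation should fall close to the entry 43.62 ± 0.41 listed for this specialty in Table 1, since the rule implies a mean of p_∞ + (p_2008 - p_∞)(1 - μ) and a standard deviation of (p_2008 - p_∞)σ.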

By simply looking at the performances expected at the
next edition of the Games in London 2012, we can ask
what is the probability that the winner of the gold medal
will beat the current world record of her/his specialty. In
Table 1, we list these probabilities for some specialties
together with the most likely performance values that
gold-medal winners will obtain. In athletics, there are
non-negligible chances (about 30% or more) that the current
world records of 100 meters, 110 meters hurdles and marathon
will be lowered by men. In swimming specialties, the
expectations are more promising: there is a good
probability (higher than 70%) that the world record of 1,500
meters freestyle will be beaten by male athletes.
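
The probability of a new world record at the next edition follows from the normal one-step prediction, as in the hedged Python sketch below. The model parameters for the men's 100 meters come from Table 1. The Beijing 2008 winning time (9.69 seconds) and the world record (9.58 seconds) are inputs assumed here, because neither is given in the text, and the function name is hypothetical.

```python
from statistics import NormalDist


def prob_record(p_prev, p_inf, mu, sigma, record, lower_is_better=True):
    """Probability that the next gold-medal performance beats `record`,
    using the one-step normal prediction of the model."""
    gap = p_prev - p_inf
    predicted = NormalDist(mu=p_inf + gap * (1 - mu), sigma=abs(gap) * sigma)
    below = predicted.cdf(record)
    return below if lower_is_better else 1 - below


# Men's 100 m: parameters from Table 1; 9.69 s (2008 winning time) and
# 9.58 s (world record) are assumed inputs, not given in the text.
p = prob_record(p_prev=9.69, p_inf=8.28, mu=0.04, sigma=0.10, record=9.58)
print(round(p, 2))
```

With these inputs the result is approximately 0.35, in line with the value of P listed for the men's 100 meters in Table 1. For jumps and throws, where larger values are better, `lower_is_better=False` returns the complementary probability.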



sport          gender  specialty     p_∞     μ     σ     p-value  E   P     p_2012
Track & Field  Men     100m          8.28    0.04  0.10  0.64     26  0.35  9.63 ± 0.13
Track & Field  Men     110m hurdles  11.76   0.05  0.12  0.48     26  0.50  12.87 ± 0.14
Track & Field  Men     400m          41.62   0.06  0.19  0.98     26  0.14  43.62 ± 0.41
Track & Field  Men     10,000m       1,539   0.05  0.19  0.45     22  0.01  1,617 ± 15
Track & Field  Men     marathon      5,771   0.03  0.15  0.58     26  0.34  7,537 ± 273
Track & Field  Men     pole vault    6.87    0.05  0.08  0.91     26  0.03  6.00 ± 0.07
Track & Field  Men     hammer throw  103.81  0.04  0.09  0.47     25  0.03  82.89 ± 1.96
Track & Field  Women   100m          9.72    0.05  0.19  0.97     19  0.12  10.73 ± 0.20
Track & Field  Women   400m          45.14   0.02  0.15  0.77     12  0.00  49.53 ± 0.67
Track & Field  Women   long jump     8.12    0.04  0.18  0.34     16  0.01  7.08 ± 0.19
Swimming       Men     100m fs       44.84   0.09  0.10  0.92     23  0.36  47.00 ± 0.24
Swimming       Men     100m bs       48.98   0.09  0.11  0.93     22  0.24  52.22 ± 0.39
Swimming       Men     100m brs      57.38   0.16  0.16  0.93     11  0.36  58.67 ± 0.24
Swimming       Men     1,500m fs     577     0.05  0.05  0.50     23  0.71  866 ± 15
Swimming       Women   100m fs       51.87   0.12  0.19  0.54     22  0.00  52.97 ± 0.24
Swimming       Women   100m bs       54.73   0.08  0.14  0.59     20  0.20  58.62 ± 0.59
Swimming       Women   100m brs      62.08   0.13  0.10  0.86     11  0.15  64.77 ± 0.31
Swimming       Women   800m fs       388     0.05  0.07  0.84     11  0.76  489 ± 7

Table 1: Predictions of gold-medal performances in athletics and swimming. We summarize here some of the results obtained
with our analysis. We list several specialties in athletics and swimming performed by male and female athletes. For each
specialty, we report from left to right: the name of the specialty, the best estimate of the asymptotic performance value p_∞,
the best estimate of the mean value μ, the best estimate of the standard deviation σ, the statistical significance or p-value of
the test of normality, the number E of Olympic Games that included the specialty, the probability P that the current world
record will be beaten in London 2012, and the most likely performance value p_2012 that gold-medal winners will obtain at the
next edition of the Olympic Games. For brevity, in swimming specialties we abbreviate "freestyle" with "fs",
"backstroke" with "bs", and "breaststroke" with "brs". The values of p_∞ and p_2012 are reported in seconds for running and
swimming races, and in meters for jumping and throwing events.



Relevant limits are unlikely to be broken at the next
Olympics (Fig. 3d). We will have to wait until 2020 in
order to have a 50% chance that a man will run the 100
meters in less than 9.50 seconds. For other specialties,
expectations (probability higher than 50%) are even less
promising: men will run the 400 meters in less than 43.00
seconds and the marathon in less than two hours (7,200
seconds) only after 2030, women will run the 100 meters
sprint in less than 10.40 seconds only after 2040, and
finally the wall of 26 minutes (1,560 seconds) in 10,000
meters will likely be breached by male athletes only after
year 2080.
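
A curve of the kind described above can be sketched in Python by iterating the prediction rule over several editions. The interpretation used here, namely the fraction of simulated trajectories whose winning time at a given edition is below the wall, is an assumption of this sketch, as are the function name, the number of trajectories and the Beijing 2008 starting time of 9.69 seconds. The remaining parameters are the men's 100 meters values of Table 1.

```python
import random


def breach_probability(p_start, p_inf, mu, sigma, wall, n_editions,
                       n_paths=20_000, seed=0):
    """Fraction of simulated trajectories whose winning time at each future
    edition is below `wall` (time events, lower is better)."""
    rng = random.Random(seed)
    hits = [0] * n_editions
    for _ in range(n_paths):
        gap = p_start - p_inf
        for k in range(n_editions):
            gap *= 1 - rng.gauss(mu, sigma)   # Eq. (1) applied once per edition
            if gap + p_inf < wall:
                hits[k] += 1
    return [h / n_paths for h in hits]


# Men's 100 m: Table 1 parameters; 9.69 s is the assumed 2008 winning time.
probs = breach_probability(9.69, 8.28, 0.04, 0.10, wall=9.50, n_editions=6)
for k, p in enumerate(probs):
    print(2012 + 4 * k, round(p, 2))
```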



Discussion 

In conclusion, our paper shows that the performances of
Olympic medal winners in athletics and swimming obey,
independently of the type of specialty, a simple universal
law. If performance improvements are calculated with
respect to an asymptotic performance value, then the
relative improvement obtained between two consecutive
editions of the Games is a random variate
following a normal distribution. This is the common
property of a broad class of natural phenomena that can be
described by the theory of biased random walks [34], such
as the locomotory movements of organisms responding
to an external stimulus [35-37], the activity of spiking
neurons [38], the trends of daily temperatures [39], stock
prices [40], capital markets [41], etc.
The normality of the relative improvements cannot be
explained in trivial terms, especially in this case where
the statistics is performed on extremal properties of the
system. Remember in fact that the performance values
analyzed here are those obtained by the best athletes of
a given edition of the Olympics (i.e., potentially the best
performers on Earth), and thus it is natural to expect
that absolute performance values obey statistical laws of
extremes [42]. More importantly, since the distribution
is normal, it makes sense to refer to average trajectories
of top performance values along editions of the Games.
Our findings in fact allow us to say that, on average, the
absolute performance value of top athletes at the Olympics
gets closer to the limiting performance value in an
exponential fashion, with a rate of about 5% in athletics
and 10% in swimming. In more detail, the average
trajectory of the performance value can be described by the
equation



\langle p_y \rangle = p_\infty + (p_{y_0} - p_\infty) \, e^{-\mu (y - y_0)/4} ,    (2)

where y_0 is an arbitrary initial edition year of the
Olympics and p_{y_0} is the performance value measured in
year y_0. Eq. (2) can be derived directly from Eq. (1) and the
fact that relative improvements are normally distributed,
but only under the assumptions that the edition year
of the Olympics is considered as a continuous variable

and that (^^%^) = ^^^^^. Note that this obser- 
vation is important for stressing the difference between 
our fitting procedure and a more straightforward analy- 
sis based on the exponential fit of absolute performance 
values, as the one used to find that the progression of
world record performances follows a piecewise exponential
decaying pattern [43-45]. Note also that the analysis
of Olympic performances alone differs from that of
world record performances for the following reasons: (i)
The relative change between two world records, if defined
in a similar manner as Eq. (1), can be only a positive
quantity; (ii) The time difference between two world record
performances is not a constant, but a random variate in
itself. Because the number of events in which new world
records can be established is higher today than it was one
century ago (and it has been growing over
the years), in any analysis of the progression of world
record performances time should be rescaled to account
for that [45].
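
The derivation of Eq. (2) mentioned above can be sketched as follows. This is a reconstruction under explicitly labeled assumptions, namely that each relative improvement is statistically independent of the preceding gap and that the four-year spacing between editions is treated as a continuous time variable. It is not necessarily the exact route followed in the original analysis.

```latex
% Sketch of how Eq. (2) can follow from Eq. (1); assumptions are marked.
\begin{align}
\Delta p_y &= \Delta p_{y-4}\,(1-\xi_y)
  && \text{(Eq.~1 rearranged)} \\
\langle \Delta p_y \rangle &= (1-\mu)\,\langle \Delta p_{y-4} \rangle
  && \text{(assumption: $\xi_y$ independent of $\Delta p_{y-4}$, $\langle \xi_y \rangle = \mu$)} \\
\frac{d\langle \Delta p_y \rangle}{dy} &\simeq -\frac{\mu}{4}\,\langle \Delta p_y \rangle
  && \text{(assumption: $y$ continuous, four years per edition)} \\
\langle p_y \rangle &= p_\infty + \left(p_{y_0}-p_\infty\right) e^{-\mu\,(y-y_0)/4}
\end{align}
```

For small μ the factor e^{-μ} per edition is close to 1 - μ, which is consistent with an approach rate of about 5% per edition in athletics and 10% in swimming.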

The asymptotic performance value p_∞ is an a priori
unknown variable whose value can be self-consistently
determined by maximizing the statistical significance of the
normality fit. It is particularly important to stress that
our simple methodology provides good estimates of
performance limits that are in general consistent with those
obtained through complicated physiological models [31-33].
For example, Perronet and Thibault predicted that
the limiting time for men in marathon is 1 hour, 48
minutes and 26 seconds [32]. With our minimalistic model,
we are able to predict that this limiting time is between 1
hour, 36 minutes and 11 seconds and 1 hour, 41 minutes
and 40 seconds (for men marathon the peak of statistical
significance is wide, see Fig. 2a). At the same time, it
is also important to stress that our minimalistic analysis
can also lead to small inconsistencies. For example, the
best estimates of p_∞ obtained here state that,
asymptotically, the average pace in marathon would be faster
than the one in 10,000 meters. This means that, according
to our estimates, the first 10,000 meters in marathon
would be run in less than 23 minutes, while the entire
race of 10,000 meters would be run asymptotically in
more than 25 minutes. This inconsistency can be
partially explained by the fact that the statistics for 10,000
meters is less reliable because it is based only on 22 events,
while the one for marathon is based on the results of 26
editions of the Games. In general, it is very important to
remark that, at the moment, our estimates of the asymptotic
performance values are based on a relatively small set of
empirical data (at best 26 editions of the Olympics), and
therefore must be taken with a grain of salt. We expect
in fact that, while the normal law governing performance
improvements will likely continue to hold, the accuracy
in the estimation of the asymptotic performance values
will improve with the addition of more data points in
the future, starting already from the next edition of the
Games in London 2012.



Materials and Methods 

Data set 

Medal lists and results of all editions of the
Olympic Games have been collected from the
web sites www.sports-reference.com and
www.databaseolympics.com. Whenever possible,
we considered automatic measures of time instead of
manual ones. We included in our study all results
obtained in the editions of the modern Olympic Games
since Athens 1896, but we excluded from the analysis
data about the so-called "Intercalated" edition of the
Games held in Athens in 1906. We focused on sports
classified as "Track & Field" and "Swimming", and
particularly on specialties of these sports that have
been performed at least in the latest ten editions of
the Olympic Games. We compared only performances
between subsequent editions of the Games held four
years apart. We therefore excluded comparisons
between the consecutive editions of Stockholm
1912 and Antwerp 1920 (separated by World War I),
and between those of Berlin 1936 and London 1948
(separated by World War II).

For consistency, we considered only specialties whose
rules or techniques have not changed over time. For
example, we excluded javelin throw because of the
javelin redesign in 1986. We also excluded performances
in high jump before Mexico City 1968, when athletes
started for the first time to adopt the modern jump style
called "Fosbury flop".

Data are made available for download at
filrad.homelinux.org/resources.



Normality test 

The results reported in the paper are based on the
normality test introduced by Anderson and
Darling [46]. Given a value of p_∞, we compute the best
estimates of the mean μ and the standard deviation σ as

\mu = \frac{1}{R} \sum_y \xi_y , \qquad \sigma = \sqrt{\frac{1}{R-1} \sum_y (\xi_y - \mu)^2} ,

respectively. The relative improvement ξ_y is defined
in Eq. (1). R indicates the number of results between
consecutive editions of the Olympic Games that are
included in the analysis. We then compute the z-scores as
z_y = (ξ_y - μ)/σ and rearrange them in ascending order
such that z_1 < z_2 < ... < z_R. The Anderson-Darling
distance is computed with the formula

A^2 = -R - \frac{1}{R} \sum_{i=1}^{R} \left[ (2i-1) \log \Phi(z_i) + (2(R-i)+1) \log\left(1 - \Phi(z_i)\right) \right] ,

where Φ(z_i) is the standard normal cumulative
distribution function. We further use the modified statistic
A^{*2} = A^2 (1 + 4/R - 25/R^2), suitable in the case
in which both the mean and standard deviation are
estimated from the data, as suggested by Stephens [47].
We evaluate the goodness of the fit by generating
10^ random number sequences of length R extracted
from the standard normal distribution. The statistical
significance of the normality test (p-value) is calculated
as the number of artificial sequences whose A^{*2} is larger
than the one measured for real data divided by the
total number of generated sequences. Note that there
is a trivial monotonic relation between the p-value and
the Anderson-Darling distance A^{*2}, and therefore the
maximum of the p-value corresponds to the minimum of
A^{*2}.
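
The test described above can be sketched in Python with the standard library only. The number of artificial sequences (`n_sequences`), the seeds and the two test samples are assumptions of this sketch, chosen to keep the example fast, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

_PHI = NormalDist().cdf


def ad_modified(sample):
    """A*^2 = A^2 (1 + 4/R - 25/R^2), with z-scores built from the sample
    mean and standard deviation."""
    r = len(sample)
    m, s = mean(sample), stdev(sample)
    z = sorted((x - m) / s for x in sample)
    total = sum((2 * i - 1) * math.log(_PHI(zi))
                + (2 * (r - i) + 1) * math.log(_PHI(-zi))   # log(1 - Phi(z_i))
                for i, zi in enumerate(z, start=1))
    return (-r - total / r) * (1 + 4 / r - 25 / r**2)


def normality_p_value(sample, n_sequences=2000, seed=0):
    """Fraction of standard-normal sequences of the same length whose A*^2
    exceeds the one measured on `sample`."""
    rng = random.Random(seed)
    observed = ad_modified(sample)
    r = len(sample)
    larger = sum(
        ad_modified([rng.gauss(0, 1) for _ in range(r)]) > observed
        for _ in range(n_sequences)
    )
    return larger / n_sequences


rng = random.Random(42)
gaussian = [rng.gauss(0.06, 0.19) for _ in range(25)]
skewed = [rng.expovariate(1.0) for _ in range(25)]
print(normality_p_value(gaussian), normality_p_value(skewed))
```

Because the z-scores are standardized with the sample mean and standard deviation, the statistic does not depend on the location and scale of the data, which is why the artificial sequences can be drawn from the standard normal distribution.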

We used the normality test by Anderson and Darling
because this test is considered one of the best empirical
distribution function statistics for detecting most
departures from normality, and can be used for testing the
normality of very small sample sizes [57]. We verified,
however, the robustness of our results by using other
standard normality tests, including those based on the
criteria of Kolmogorov-Smirnov, Cramér-von Mises and
Shapiro-Wilk [48, 49]. We also verified the consistency
of our results with normality tests based on the moments
of the distributions (see Fig. S6).

Furthermore, we tested the accuracy of our fitting
method by implementing a bootstrap procedure [50],
and found that our fitting method is able to recover
well the correct parameter values in artificial sequences
generated according to our model (see Fig. S7).